An Improved Algorithm for Finding Linked Subsets in the Presence of Heterogeneity or Interactions
Abstract
Identifying regions of DNA that contribute to disease becomes more difficult as diseases become more complex. Traits such as blood pressure or heart disease have complex etiologies involving multiple genes or gene-by-environment interactions. In these cases, many current models are too simple to detect the effects of genes. Recently, Shannon et al. proposed the use of recursive partitioning (RP) techniques to handle such situations. These methods can separate a sample into increasingly similar subgroups based on some criterion, typically an improved fit to a regression model. The “tree linkage” method employed by Shannon et al. was based on improving the regression fit for standard Haseman-Elston regression. In Haseman-Elston regression, the magnitude of the regression slope is the indicator of association between a trait and a region of DNA. Therefore, it was proposed that the tree linkage algorithm would be improved by changing the method of partitioning the data to reflect this. Specifically, basing the partitioning criterion on the regression slope (and ignoring the intercept) should improve type I error rates. Simulation results comparing the original algorithm with one based on slope terms showed greater bias in the distribution of empirical p-values for the original algorithm. Furthermore, a comparison of performance under a specific alternative showed that the original algorithm had slightly less favorable distributions of linkage group impurity and tree specificity. Both algorithms performed similarly with respect to sensitivity and kappa, although kappa was slightly better for the improved algorithm.
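The contrast between a fit-based and a slope-based partitioning criterion can be illustrated with a minimal sketch. The abstract does not give the exact criteria, so everything below is an assumption made for illustration: the function names `ols_fit` and `split_scores` are hypothetical, the fit-based score is taken to be the reduction in residual sum of squares, and the slope-based score is taken to be the absolute difference between the two child slopes. The regression itself follows the standard Haseman-Elston setup, in which the squared sib-pair trait difference is regressed on the proportion of alleles shared identical by descent (IBD).

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (intercept, slope, rss). Assumes x is not constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, rss


def split_scores(ibd, sq_diff, covariate, threshold):
    """Score one candidate split of the sib pairs at covariate <= threshold.

    ibd      : proportion of alleles shared IBD per sib pair
    sq_diff  : squared trait difference per sib pair
    Returns (fit_gain, slope_gap); both definitions are illustrative
    assumptions, not the criteria used in the paper.
    """
    left = [i for i, c in enumerate(covariate) if c <= threshold]
    right = [i for i, c in enumerate(covariate) if c > threshold]

    _, _, rss_parent = ols_fit(ibd, sq_diff)
    _, slope_l, rss_l = ols_fit([ibd[i] for i in left], [sq_diff[i] for i in left])
    _, slope_r, rss_r = ols_fit([ibd[i] for i in right], [sq_diff[i] for i in right])

    fit_gain = rss_parent - (rss_l + rss_r)   # rewards any improvement in fit
    slope_gap = abs(slope_l - slope_r)        # ignores intercept differences
    return fit_gain, slope_gap


# Toy data: the two subgroups differ only in intercept, not in slope.
ibd = [0.0, 0.5, 1.0, 0.0, 0.5, 1.0]
covariate = [0, 0, 0, 1, 1, 1]
intercept_only = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
print(split_scores(ibd, intercept_only, covariate, 0.5))
```

In this toy case the fit-based score is large even though neither subgroup shows a slope, while the slope-based score is zero. That is the kind of intercept-driven split the slope-based criterion is meant to avoid.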
Publication date: 2008